Abstract: We introduce the problem of communication with
partial information, where there is an asymmetry between the 
transmitter and the receiver codebooks. Practical applications 
of the proposed setup include the robust signal hashing problem 
within the context of multimedia security and asymmetric 
communications with resource-lacking receivers. We study this 
setup in a binary detection theoretic context for the addi- 
tive colored Gaussian noise channel. In our proposed setup, 
the partial information available at the detector consists of 
dimensionality-reduced versions of the transmitter codewords, 
where the dimensionality reduction is achieved via a linear 
transform. We first derive the corresponding MAP-optimal 
detection rule and the corresponding conditional probability 
of error (conditioned on the partial information the detector 
possesses). Then, we constructively quantify an optimal class 
of linear transforms, where the cost function is the expected 
Chernoff bound on the conditional probability of error of the 
MAP-optimal detector. 

I. Introduction 

In this paper, we introduce a communication-theoretic
paradigm, which we call "communication with partial
information", and subsequently study it within a detection- 
theoretic context (therefore the term "detection with partial 
information") in a particular case of the Gaussian setup. 
In the proposed paradigm, there is an inherent asymmetry 
between the information the transmitter and the receiver pos- 
sess in terms of the utilized codebooks. In particular, in the 
"detection with partial information" setup, the codebook of 
the receiver is formed via applying a non-invertible process 
on the codebook of the transmitter; hence the codebooks are 
different. Thus, the information available at the transmitter 
forms a "superset" of the information available at the re- 
ceiver. Note that, a reminiscent asymmetric structure between 
the transmitter and the receiver also exists in the well-known 
family of problems, termed as "communication with side 
information" [1], [2], [3], [4]. However, in the paradigm of 
"communication with side information" (unlike the proposed 
"communication with partial information" setup), the utilized 
codebooks at the receiver and the transmitter are the same; 
in addition, either the transmitter or the receiver is "favored" 
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with the presence of "extra" information (which amounts to 
the "side information"). 

There are at least two significant applications that
motivate the formulation of the "communication with partial
information" approach:

• The first application can be viewed as falling within the
category of "robust signal hashing" in the signal pro-
cessing & multimedia security literature [5], [6], [7],
[8]. In robust signal hashing, a content owner provides
"robust hash values" of the protected content (that is,
some dimensionality-reduced versions of the protected 
content) to a third party, which searches the content 
using its robust hash values as the partial information 
at the receiver end. These robust hash values repre- 
sent "the content's significant features" and are ideally 
approximately-invariant under acceptable modifications 
to the content. In practical applications, the third party 
that performs the hash-based search is usually not 
trusted; hence, there is a significant issue of privacy. In 
particular, given a robust hash value, it should ideally 
be impossible to retrieve the original protected content 
from a privacy viewpoint. The setup proposed in this 
paper can be used as a detection-theoretic model to 
analyze the hash-based detection problem: the protected 
content is represented by the transmitted signal; the 
robust hash values used in the search are represented 
by the partial information available at the receiver; a 
perceptually-acceptable modification to the protected 
content is represented by the channel noise. 

• The second application includes all instances of point- 
to-point communications, where there is an inherent 
asymmetry between the transmitter and the receiver in 
terms of their storage capabilities and computational 
resources. In particular, cases where the receiver
is unable to store the codebook used by the encoder
(due to a limit on the memory) or to utilize it (due
to a limit on the computational resources) can be
studied within the framework of "communication with
partial information". In such cases, one potential remedy
is for the receiver to use a "simplified" (i.e.,
dimensionality-reduced) version of the codebook of the
encoder. In practice, such situations may arise, for
instance, when there is bi-directional communication
between a sensor and a base station (the resource-limited
receiver being the sensor) or between a controller and
a remote measurement unit. In such applications, the
simplified version of the encoder codebook is represented
by the partial information at the receiver side.

Our contributions in this paper can be listed as follows: 

• We introduce the paradigm of "communication with 
partial information" and study it within the context 
of binary detection in the Gaussian setup. We believe 
the main philosophy behind this formulation (i.e., in- 
troducing an asymmetry between the transmitter and 
the receiver in the sense of utilized codebooks) can 
be used to analyze various problems of interest in 
communication theory and signal processing. 

• Within the binary hypothesis testing setup, we study the
case where the disturbance on the transmitter output
consists of additive colored Gaussian noise, and the
detector partial information is produced via applying 
a linear (dimensionality-reducing) transform on the en- 
coder codebook. Consequently, we present the follow- 
ing results: 

- We derive the MAP-optimal detection rule and the 
corresponding probability of error, both of which 
are conditioned on the partial information available 
at the detector. 

- We construct a class of optimal linear transforms, 
which minimize the expected (with respect to the 
joint distribution of the detector partial information) 
Chernoff bound on the aforementioned probability 
of detection error. 

In Sec. II, we present the notation that is used throughout
the paper and specify the formal problem statement. In
Sec. III, we derive the MAP-optimal detection rule condi-
tioned on the partial information available at the receiver. In
Sec. IV, we quantify an optimal (in the sense of the expected
value of the Chernoff bound on the detection error probabil-
ity) class of linear transforms that are used to generate the
receiver partial information. We present illustrative numerical
results in Sec. V, followed by discussions and conclusion in
Sec. VI.



II. Notation and Problem Statement
A. Notation

Boldface lowercase and uppercase letters denote vectors and matrices, respectively; the corresponding regular letters with subscripts denote their individual elements. For instance, given a vector $\mathbf{a}$, $a_i$ represents its $i$-th element; given a matrix $\mathbf{A}$, $A_{ij}$ denotes its $(i,j)$-th element. Note that we do not use a separate notation for random vectors; we assume that it is clear from the context.

Given a matrix $\mathbf{A}$, $\mathbf{A}^T$, $r(\mathbf{A})$ and $\det(\mathbf{A})$ denote its transpose, rank and determinant, respectively; further, $\mathbf{I}_n$ denotes the identity matrix of size $n \times n$. Given the vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^m$, $\langle \mathbf{x}, \mathbf{y} \rangle$ indicates the inner product that induces the Euclidean norm, i.e., $\langle \mathbf{x}, \mathbf{y} \rangle = \sum_i x_i y_i$; accordingly, the induced Euclidean norm is denoted by $\|\mathbf{x}\| = (\mathbf{x}^T\mathbf{x})^{1/2}$.

Definition 2.1: Given $\mathbf{A} \in \mathbb{R}^{m \times n}$ such that $r(\mathbf{A}) = k \le \min(m, n)$, the (compact) Singular Value Decomposition (SVD) of $\mathbf{A}$ is defined as

$$
\mathbf{A} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}^T, \tag{2.1}
$$

where $\mathbf{U} \in \mathbb{R}^{m \times k}$, $\mathbf{V} \in \mathbb{R}^{n \times k}$ and $\mathbf{\Lambda} \in \mathbb{R}^{k \times k}$ are called the left-singular vector matrix (orthonormal columns), the right-singular vector matrix (orthonormal columns) and the singular value matrix of $\mathbf{A}$, respectively. The matrix $\mathbf{\Lambda}$ is positive-definite diagonal; we denote its diagonal entries by $\{\sigma_i(\mathbf{A})\}_{i=1}^{k}$, which are the non-zero singular values of $\mathbf{A}$ and are assumed to be in non-increasing order without loss of generality. The singular values are unique, whereas the singular vectors are determined only up to a simultaneous sign change of corresponding left/right pairs (and up to a rotation within the subspace of a repeated singular value).

For a square matrix $\mathbf{A}$ of size $k \times k$ and of rank $r \le k$, $\{\lambda_i(\mathbf{A})\}_{i=1}^{r}$ denote its non-zero eigenvalues; in case $\mathbf{A}$ is a symmetric matrix, $\{\lambda_i(\mathbf{A})\}$ are assumed to be in non-decreasing order. We use $\mathcal{N}(\boldsymbol{\mu}, \mathbf{\Sigma})$ to denote a multivariate Gaussian distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\mathbf{\Sigma}$. Furthermore, $Q(\cdot)$ denotes the standard Q-function: $Q(a) \triangleq \int_a^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$.

B. Problem Statement

We analyze a binary communication system where the encoder selects one of the two codewords, $\mathbf{x}_0$ and $\mathbf{x}_1$, representing the message bit $i \in \{0, 1\}$, where $\Pr(i = 0) = \Pr(i = 1) = 1/2$; the selected codeword, $\mathbf{x} = \mathbf{x}_i$, is sent through a channel. The encoder output $\mathbf{x}$ is corrupted by an additive, signal-independent, (not necessarily white) Gaussian noise, denoted by $\mathbf{e}$, thereby yielding the overall channel output $\mathbf{y}$. Observing $\mathbf{y}$, the receiver acts as a detector and makes a binary decision as to the origin of the received signal. We pursue a detection-theoretic approach to solve this problem and assume uniform costs. We assume that $\mathbf{x}_0$, $\mathbf{x}_1$, $\mathbf{e}$ and $\mathbf{y}$ are all length-$n$ real-valued vectors, where $\mathbf{x}_0$ and $\mathbf{x}_1$ are independent of each other and $\mathbf{x}_0, \mathbf{x}_1 \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_x)$; $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_e)$ is independent of both $\mathbf{x}_0$ and $\mathbf{x}_1$. Here, we also assume that the covariance matrix of the original signals, $\mathbf{\Sigma}_x$, and the covariance matrix of the noise, $\mathbf{\Sigma}_e$, are positive definite (they are also symmetric by construction). See Fig. 1 for a schematic illustration of the proposed problem.

[Figure: message bit $i \in \{0, 1\}$ → encoder (codebook $\{\mathbf{x}_0, \mathbf{x}_1\}$) → $\mathbf{x} = \mathbf{x}_i$ → $+$ noise $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_e)$ → received signal $\mathbf{y}$ → detector (codebook $\mathbf{z}_i = \mathbf{T}\mathbf{x}_i$, $i = 0, 1$) → decision]

Fig. 1. Block diagram representation of the problem of "binary detection with partial information".

In the considered setup, the detector does not know the original codewords $\{\mathbf{x}_0, \mathbf{x}_1\}$, but only their distributions and their dimensionality-reduced versions, $\{\mathbf{z}_0, \mathbf{z}_1\}$, where $\mathbf{z}_i = \mathbf{T}\mathbf{x}_i$, $i = 0, 1$, and $\mathbf{T}$ is a deterministic real matrix of size $m \times n$, $m < n$, $r(\mathbf{T}) = m$. Note that this implies that $\mathbf{z}_0$ and $\mathbf{z}_1$ are both length-$m$ real-valued vectors. As such, the proposed problem is radically different from the conventional binary detection scenario due to the mismatch between the codebooks of the encoder and the detector. Consequently, we term the problem at hand "detection with partial information" for the Gaussian case.
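The generative model above can be simulated directly. The following Python sketch assumes NumPy; the helper names `random_spd` and `simulate_channel`, and the choice of drawing covariance eigenvalues uniformly from [0.1, 1] with a random eigenbasis, are illustrative assumptions rather than part of the paper. It draws one realization of the message bit, the channel output and the detector's partial information.

```python
import numpy as np

def random_spd(n, rng, low=0.1, high=1.0):
    """Illustrative helper: random eigenbasis, eigenvalues ~ Uniform(low, high)."""
    basis, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eigvals = rng.uniform(low, high, size=n)
    s = basis @ np.diag(eigvals) @ basis.T
    return (s + s.T) / 2                                  # symmetrize against round-off

def simulate_channel(T, sigma_x, sigma_e, rng):
    """One draw of the Sec. II-B model: returns (i, y, z0, z1)."""
    n = sigma_x.shape[0]
    x0 = rng.multivariate_normal(np.zeros(n), sigma_x)   # codeword for bit 0
    x1 = rng.multivariate_normal(np.zeros(n), sigma_x)   # codeword for bit 1, independent of x0
    e = rng.multivariate_normal(np.zeros(n), sigma_e)    # colored Gaussian noise
    i = int(rng.integers(2))                              # equiprobable message bit
    y = (x0 if i == 0 else x1) + e                        # channel output
    return i, y, T @ x0, T @ x1                           # the detector only sees z0, z1 (and y)

rng = np.random.default_rng(0)
n, m = 8, 3
sigma_x, sigma_e = random_spd(n, rng), random_spd(n, rng)
T = rng.standard_normal((m, n))                           # full row rank with probability one
i, y, z0, z1 = simulate_channel(T, sigma_x, sigma_e, rng)
print(i, y.shape, z0.shape, z1.shape)
```

Drawing $\mathbf{T}$ with i.i.d. Gaussian entries is only a convenient way to obtain a full-row-rank matrix; any $\mathbf{T}$ with $r(\mathbf{T}) = m$ fits the setup.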

An important point here is that, since the receiver fully
knows the statistical characterization of the whole system,
it is able to apply the MAP decoding rule. In particular, in
Sec. III we derive the MAP detection rule, which is given
as a function of the partial information $(\mathbf{z}_0, \mathbf{z}_1)$, and the
corresponding conditional probability of error (conditioned
on $\mathbf{z}_0$ and $\mathbf{z}_1$). Subsequently, in Sec. IV we derive the
optimal linear transform, $\mathbf{T}$, in the sense of the expected
Chernoff bound on the conditional probability of error of
the MAP detector.

Remark 2.1: In [9], the authors study a closely-related
problem, which can be viewed as the "deterministic variant"
of the aforementioned setup. In particular, in [9] the authors
assume that the encoder codewords $\{\mathbf{x}_i\}$ are deterministic
and unknown, and the subsequent analysis is based on the
probability of error induced by the GLRT (generalized likelihood
ratio test) rule. On the other hand, in this paper, we assume
that the encoder codewords $\{\mathbf{x}_i\}$ are random (in particular,
Gaussian) and perform a MAP-based analysis.

Remark 2.2: Although the problem considered in this paper
is the binary detection case, the analysis can be extended
to the $L$-ary case¹ via a "union bound based approach"
with little or no difficulty. A similar approach and
discussion was provided in [9] for the case of deterministic
$\{\mathbf{x}_i\}$.

III. Optimal Detection Conditioned On The Partial Information

At the detector side, we are given $\{\mathbf{z}_0, \mathbf{z}_1\}$, which yield
partial information about the true codewords $\{\mathbf{x}_0, \mathbf{x}_1\}$. The
binary hypothesis testing approach on the detector side uti-
lizes the MAP detection rule [10]: it operates on the observed
data $\mathbf{y}$ (generated by the process explained in Sec. II-B)
and makes a binary decision regarding the message bit
given $\{\mathbf{z}_0, \mathbf{z}_1\}$. Thus, we aim to solve the following binary
hypothesis testing problem:



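$$
\begin{aligned}
H_0 &: \ \mathbf{y} = \mathbf{x}_0 + \mathbf{e}, \quad \text{given } \{\mathbf{z}_0, \mathbf{z}_1\},\\
H_1 &: \ \mathbf{y} = \mathbf{x}_1 + \mathbf{e}, \quad \text{given } \{\mathbf{z}_0, \mathbf{z}_1\}.
\end{aligned}
$$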



The corresponding MAP detection rule is given by

$$
p(\mathbf{y}|H_0) \overset{H_0}{\underset{H_1}{\gtrless}} p(\mathbf{y}|H_1), \tag{3.1}
$$

since we have equal priors and uniform costs. Note that (3.1)
is also known as the maximum-likelihood detection rule [10].
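Note that, for all $i \in \{0, 1\}$, we have

$$
p(\mathbf{y}|H_i) = p(\mathbf{x}_i + \mathbf{e}\,|\,\mathbf{z}_i)\big|_{\mathbf{x}_i + \mathbf{e} = \mathbf{y}},
$$

which implies that (3.1) can be rewritten as

$$
p(\mathbf{x}_0 + \mathbf{e}\,|\,\mathbf{z}_0)\big|_{\mathbf{x}_0 + \mathbf{e} = \mathbf{y}} \overset{H_0}{\underset{H_1}{\gtrless}} p(\mathbf{x}_1 + \mathbf{e}\,|\,\mathbf{z}_1)\big|_{\mathbf{x}_1 + \mathbf{e} = \mathbf{y}}. \tag{3.2}
$$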



¹In the $L$-ary case, the message is $\log_2 L$ bits long; the encoder and
receiver codebooks are $\{\mathbf{x}_i\}_{i=0}^{L-1}$ and $\{\mathbf{z}_i\}_{i=0}^{L-1}$, respectively.



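Theorem 3.1: The maximum-likelihood detection rule (3.2) is given by

$$
\left\|\mathbf{\Sigma}_{y|z}^{-1/2}\left(\mathbf{y} - \boldsymbol{\mu}_{y_0|z_0}\right)\right\| \overset{H_1}{\underset{H_0}{\gtrless}} \left\|\mathbf{\Sigma}_{y|z}^{-1/2}\left(\mathbf{y} - \boldsymbol{\mu}_{y_1|z_1}\right)\right\|. \tag{3.3}
$$

The corresponding (conditional) probability of error (conditioned on $\mathbf{z}_0$ and $\mathbf{z}_1$) is given by

$$
P_{e|\mathbf{z}_0,\mathbf{z}_1} = Q\left(\frac{1}{2}\left\|\mathbf{\Sigma}_{y|z}^{-1/2}\left(\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1}\right)\right\|\right), \tag{3.4}
$$

where, for $i \in \{0, 1\}$, $\boldsymbol{\mu}_{y_i|z_i} \triangleq E(\mathbf{y}_i|\mathbf{z}_i)\big|_{\mathbf{y}_i = \mathbf{x}_i + \mathbf{e}} = \mathbf{\Sigma}_x\mathbf{T}^T(\mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T)^{-1}\mathbf{z}_i$; $\mathbf{\Sigma}_{y|z}$ is positive definite and given by

$$
\mathbf{\Sigma}_{y|z} = \mathrm{Cov}(\mathbf{y}_i|\mathbf{z}_i)\big|_{\mathbf{y}_i = \mathbf{x}_i + \mathbf{e},\ i = 0, 1} = \mathbf{\Sigma}_x + \mathbf{\Sigma}_e - \mathbf{\Sigma}_x\mathbf{T}^T(\mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T)^{-1}\mathbf{T}\mathbf{\Sigma}_x.
$$

Proof: See Appendix I. ■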
Remark 3.1: Using Theorem 3.1, we see that if $\mathbf{z}_0 = \mathbf{z}_1$,
the conditional probability of error is $Q(0) = 1/2$, which is meaningful:
there is then nothing to discriminate from the detector's
perspective, which converts the detection into a fair coin
toss.
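A minimal NumPy sketch of Theorem 3.1, assuming the model of Sec. II-B: it forms the conditional-mean map $\boldsymbol{\mu}_{y_i|z_i} = \mathbf{A}\mathbf{z}_i$ with $\mathbf{A} = \mathbf{\Sigma}_x\mathbf{T}^T(\mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T)^{-1}$, applies the Mahalanobis-distance rule (3.3), and evaluates (3.4). The function names and the random covariances are hypothetical choices for illustration; $Q(\cdot)$ is computed through `math.erfc`. The last line reproduces Remark 3.1.

```python
import math
import numpy as np

def q_function(a):
    """Standard Q-function, Q(a) = 0.5 * erfc(a / sqrt(2))."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def conditional_stats(T, sigma_x, sigma_e):
    """Return A and Sigma_{y|z}, where mu_{y_i|z_i} = A @ z_i (Theorem 3.1)."""
    sigma_z = T @ sigma_x @ T.T
    A = sigma_x @ T.T @ np.linalg.inv(sigma_z)        # Sigma_x T^T (T Sigma_x T^T)^{-1}
    s = sigma_x + sigma_e - A @ T @ sigma_x           # conditional covariance
    return A, (s + s.T) / 2                           # symmetrize against round-off

def map_detect(y, z0, z1, T, sigma_x, sigma_e):
    """Rule (3.3): decide the hypothesis with the smaller Mahalanobis distance."""
    A, s = conditional_stats(T, sigma_x, sigma_e)
    r0, r1 = y - A @ z0, y - A @ z1
    d0 = r0 @ np.linalg.solve(s, r0)
    d1 = r1 @ np.linalg.solve(s, r1)
    return 0 if d0 <= d1 else 1

def conditional_error(z0, z1, T, sigma_x, sigma_e):
    """P_{e|z0,z1} from (3.4)."""
    A, s = conditional_stats(T, sigma_x, sigma_e)
    diff = A @ (z0 - z1)                              # mu_{y0|z0} - mu_{y1|z1}
    dist = math.sqrt(max(diff @ np.linalg.solve(s, diff), 0.0))
    return q_function(dist / 2.0)

rng = np.random.default_rng(1)
n, m = 6, 2
root_x = rng.standard_normal((n, n))
root_e = rng.standard_normal((n, n))
sigma_x = root_x @ root_x.T + np.eye(n)               # illustrative covariances
sigma_e = root_e @ root_e.T + np.eye(n)
T = rng.standard_normal((m, n))
z = T @ rng.standard_normal(n)
print(conditional_error(z, z, T, sigma_x, sigma_e))   # Remark 3.1: z0 == z1 gives 0.5
```

Solving linear systems with $\mathbf{\Sigma}_{y|z}$ instead of forming $\mathbf{\Sigma}_{y|z}^{-1/2}$ explicitly gives the same squared norm, since $\|\mathbf{\Sigma}^{-1/2}\mathbf{v}\|^2 = \mathbf{v}^T\mathbf{\Sigma}^{-1}\mathbf{v}$.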

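Remark 3.2: The argument of the Q-function in (3.4) is
always non-negative. This allows us to upper-bound (3.4) via the
Chernoff bound on $Q(\cdot)$, which holds for non-negative arguments,
and thereby obtain a tight bound on the expected probability of
error, which we analyze in Sec. IV.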

IV. Optimal Linear Operators In The Expectation Sense

In this section, our performance criterion is based on the
expected (unconditional) probability of error of the MAP
detector, denoted by $P_e$ and given by

$$
P_e = E_{\{\mathbf{z}_0,\mathbf{z}_1\}}\left[P_{e|\mathbf{z}_0,\mathbf{z}_1}\right], \tag{4.1}
$$

where $E_{\{\mathbf{z}_0,\mathbf{z}_1\}}(\cdot)$ denotes expectation with respect to the
joint distribution of $\mathbf{z}_0$ and $\mathbf{z}_1$, and $P_{e|\mathbf{z}_0,\mathbf{z}_1}$ is given
by (3.4).

Remark 4.1: It appears to be manageable to find a linear
transform that minimizes the conditional probability of
error, $P_{e|\mathbf{z}_0,\mathbf{z}_1}$ (see, for instance, [9]), as a function of the
transmitted signals, $\mathbf{x}_0$ and $\mathbf{x}_1$, which would yield an "input-
adaptive optimal transform". On the other hand, the expected
probability of error given by (4.1) is not tractable for an
analogous analysis aimed at characterizing the optimal
linear transform $\mathbf{T}$ that minimizes it. This stems from the
fact that such an optimal $\mathbf{T}$ would be a function of the
overall statistics of the system (corresponding to applying
the operator $E_{\{\mathbf{z}_0,\mathbf{z}_1\}}(\cdot)$ in (4.1)) rather than of individual
realizations, which yields a "complicated" cost function to
minimize: the result of the expectation operation, i.e., the
$2m$-fold integration in (4.1), is not given in terms of
standard analytical functions. Therefore, we continue our
analysis by characterizing linear operator(s) that minimize
a tight upper bound on the expected probability of error
defined by (4.1).



Hence, we proceed with the following approach: we
first bound $P_{e|\mathbf{z}_0,\mathbf{z}_1}$ from above for any given pair $\{\mathbf{z}_0, \mathbf{z}_1\}$,
and then make use of the fact that the expected value of this
upper bound is an upper bound on $P_e$ (by the monotonicity
of expectation). Also, note that the use of an upper bound
clearly makes sense, since we aim to minimize $P_e$. The
upper bound on $P_{e|\mathbf{z}_0,\mathbf{z}_1}$ that we use is the Chernoff bound
on the Q-function (see the Basic Inequality in [12]), which is
an exponentially decaying and sufficiently tight bound.
The expected Chernoff bound, which replaces the primary
objective function $P_e$ in the design of the optimal linear transform
$\mathbf{T}$ due to its analytical tractability and sufficient tightness, is
derived in the following proposition.

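Proposition 4.1: The Chernoff bound on $P_{e|\mathbf{z}_0,\mathbf{z}_1}$ is

$$
P_{e|\mathbf{z}_0,\mathbf{z}_1} \le \tilde{P}_{e|\mathbf{z}_0,\mathbf{z}_1} \triangleq \frac{1}{2}\exp\left(-\frac{1}{8}\left\|\mathbf{\Sigma}_{y|z}^{-1/2}\left(\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1}\right)\right\|^2\right), \tag{4.2}
$$

yielding the following corresponding "expected Chernoff bound" on $P_e$:

$$
P_e \le \frac{1}{2}\det\left(\mathbf{I}_m + \frac{1}{2}\mathbf{W}\right)^{-1/2}, \tag{4.3}
$$

where $\mathbf{W} \triangleq \mathbf{T}\mathbf{\Sigma}_x\mathbf{\Sigma}_{y|z}^{-1}\mathbf{\Sigma}_x\mathbf{T}^T(\mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T)^{-1}$.

Proof: See Appendix II. ■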
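As a sanity check of Proposition 4.1, the sketch below (NumPy assumed; function names and the random instance are hypothetical) compares the closed form (4.3) with a Monte Carlo average of the conditional bound (4.2) over random codeword pairs. Because (4.3) is exactly the expectation of (4.2) over $(\mathbf{z}_0, \mathbf{z}_1)$, the two numbers should agree up to sampling error.

```python
import math
import numpy as np

def _model(T, sigma_x, sigma_e):
    """Sigma_z, A and Sigma_{y|z} for a given transform T."""
    sigma_z = T @ sigma_x @ T.T
    A = sigma_x @ T.T @ np.linalg.inv(sigma_z)
    s = sigma_x + sigma_e - A @ T @ sigma_x
    return sigma_z, A, (s + s.T) / 2

def expected_chernoff_bound(T, sigma_x, sigma_e):
    """Closed form (4.3): 0.5 * det(I_m + W/2)^(-1/2)."""
    sigma_z, A, s = _model(T, sigma_x, sigma_e)
    W = T @ sigma_x @ np.linalg.solve(s, sigma_x @ T.T) @ np.linalg.inv(sigma_z)
    return 0.5 / math.sqrt(np.linalg.det(np.eye(T.shape[0]) + 0.5 * W))

def monte_carlo_chernoff(T, sigma_x, sigma_e, trials, rng):
    """Sample average of the conditional Chernoff bound (4.2) over random (x0, x1)."""
    n = sigma_x.shape[0]
    _, A, s = _model(T, sigma_x, sigma_e)
    x0 = rng.multivariate_normal(np.zeros(n), sigma_x, size=trials)
    x1 = rng.multivariate_normal(np.zeros(n), sigma_x, size=trials)
    diff = (x0 - x1) @ T.T @ A.T                      # each row: A (z0 - z1)
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(s), diff)
    return float(np.mean(0.5 * np.exp(-d2 / 8.0)))

rng = np.random.default_rng(2)
n, m = 6, 2
root_x = rng.standard_normal((n, n))
root_e = rng.standard_normal((n, n))
sigma_x = root_x @ root_x.T + np.eye(n)
sigma_e = root_e @ root_e.T + np.eye(n)
T = rng.standard_normal((m, n))
closed = expected_chernoff_bound(T, sigma_x, sigma_e)
estimate = monte_carlo_chernoff(T, sigma_x, sigma_e, 100_000, rng)
print(closed, estimate)
```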

Remark 4.2: The bound on the expected (unconditional)
probability of error of the MAP detector, given by (4.3), is the
objective function we aim to minimize in this section. The
minimization (over $\mathbf{T}$) is carried out over a class of linear
transformations that possess certain properties imposed by the
physical structure of the analyzed system. The obvious one
of these properties is the dimension of the transformation
(i.e., the fact that $\mathbf{T}$ is an $m \times n$ matrix); the other one is
the constraint on its rank (i.e., the fact that $r(\mathbf{T}) = m$). The
rank constraint ensures that the dimensionality of the subspace
to which the partial information shared by the two sides of the
communication belongs (which is equal to $r(\mathbf{T})$) is at the
desired level. This is because the performance of a system that
utilizes a rank-deficient transformation is analogous to the
performance of another system whose transformation is full-rank
and has the same rank as the rank-deficient one.

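Definition 4.1: Since the right-hand side of (4.3) is a decreasing
function of $\det(\mathbf{I}_m + \frac{1}{2}\mathbf{W})$, the "expected probability of error
bound minimizing transform $\mathbf{T}_{\mathrm{opt}}$" is given by

$$
\mathbf{T}_{\mathrm{opt}} \triangleq \arg\max_{\mathbf{T} \in \mathbb{R}^{m \times n},\ r(\mathbf{T}) = m} \det\left(\mathbf{I}_m + \frac{1}{2}\mathbf{W}\right). \tag{4.4}
$$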



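Proposition 4.2: Let $S_T \triangleq \{\mathbf{T} \mid \mathbf{T} \in \mathbb{R}^{m \times n},\ r(\mathbf{T}) = m\}$ and $S_M \triangleq \{\mathbf{M} \mid \mathbf{M} \in \mathbb{R}^{n \times m},\ \mathbf{M}^T\mathbf{M} = \mathbf{I}_m\}$. Let the SVD of $\mathbf{\Sigma}_x$ be given by $\mathbf{\Sigma}_x = \mathbf{F}\mathbf{\Lambda}^2\mathbf{F}^T$, define $\mathbf{P} \triangleq \mathbf{\Lambda}^{-1}\mathbf{F}^T(\mathbf{\Sigma}_x + \mathbf{\Sigma}_e)\mathbf{F}\mathbf{\Lambda}^{-1}$ with SVD $\mathbf{P} = \mathbf{U}_p\mathbf{\Lambda}_p\mathbf{U}_p^T$, and let

$$
\tilde{\mathbf{\Lambda}}_p \triangleq \mathbf{I}_n - \mathbf{\Lambda}_p^{-1}.
$$

Also define

$$
G(\mathbf{M}) \triangleq \prod_{j=1}^{m} \frac{1}{2}\left(1 + \frac{1}{\lambda_j(\mathbf{M}^T\tilde{\mathbf{\Lambda}}_p\mathbf{M})}\right), \qquad J(\mathbf{T}) \triangleq \det\left(\mathbf{I}_m + \frac{1}{2}\mathbf{W}\right).
$$

Suppose there exists

$$
\mathbf{M}^* = \arg\max_{\mathbf{M} \in S_M} G(\mathbf{M}). \tag{4.5}
$$

Then, letting $\mathbf{T}^* = \mathbf{E}\mathbf{D}(\mathbf{M}^*)^T\mathbf{U}_p^T\mathbf{\Lambda}^{-1}\mathbf{F}^T$, where $\mathbf{E} \in \mathbb{R}^{m \times m}$ is an arbitrary unitary matrix and $\mathbf{D} \in \mathbb{R}^{m \times m}$ is an arbitrary diagonal positive-definite matrix, we have $\mathbf{T}^* = \arg\max_{\mathbf{T} \in S_T} J(\mathbf{T})$.

Proof: See Appendix III. ■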
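The identity behind Proposition 4.2 (the first lemma of Appendix III) can be checked numerically: for any $\mathbf{M}$ with orthonormal columns, any orthogonal $\mathbf{E}$ and any positive diagonal $\mathbf{D}$, the transform $\mathbf{T} = \mathbf{E}\mathbf{D}\mathbf{M}^T\mathbf{U}_p^T\mathbf{\Lambda}^{-1}\mathbf{F}^T$ satisfies $J(\mathbf{T}) = G(\mathbf{M})$. This NumPy sketch follows the definitions above; the random covariances and helper names are assumptions for illustration.

```python
import numpy as np

def whitening_parts(sigma_x, sigma_e):
    """F, Lambda with Sigma_x = F Lambda^2 F^T, and U_p, eig(P) with P = U_p Lambda_p U_p^T."""
    evals, F = np.linalg.eigh(sigma_x)
    lam = np.diag(np.sqrt(evals))
    lam_inv = np.diag(1.0 / np.sqrt(evals))
    P = lam_inv @ F.T @ (sigma_x + sigma_e) @ F @ lam_inv
    p_evals, U_p = np.linalg.eigh((P + P.T) / 2)
    return F, lam, U_p, p_evals

def J(T, sigma_x, sigma_e):
    """Objective of Definition 4.1: det(I_m + W / 2)."""
    sigma_z = T @ sigma_x @ T.T
    s = sigma_x + sigma_e - sigma_x @ T.T @ np.linalg.solve(sigma_z, T @ sigma_x)
    W = T @ sigma_x @ np.linalg.solve(s, sigma_x @ T.T) @ np.linalg.inv(sigma_z)
    return float(np.linalg.det(np.eye(T.shape[0]) + 0.5 * W))

def G(M, p_evals):
    """prod_j 0.5 * (1 + 1 / lambda_j(M^T tilde-Lambda_p M)), tilde-Lambda_p = I - Lambda_p^{-1}."""
    lam_tilde = np.diag(1.0 - 1.0 / p_evals)
    mu = np.linalg.eigvalsh(M.T @ lam_tilde @ M)
    return float(np.prod(0.5 * (1.0 + 1.0 / mu)))

rng = np.random.default_rng(3)
n, m = 7, 3
root_x = rng.standard_normal((n, n))
root_e = rng.standard_normal((n, n))
sigma_x = root_x @ root_x.T + np.eye(n)
sigma_e = root_e @ root_e.T + np.eye(n)
F, lam, U_p, p_evals = whitening_parts(sigma_x, sigma_e)
M, _ = np.linalg.qr(rng.standard_normal((n, m)))       # M^T M = I_m
E, _ = np.linalg.qr(rng.standard_normal((m, m)))       # arbitrary orthogonal E
D = np.diag(rng.uniform(0.5, 2.0, size=m))             # arbitrary positive diagonal D
T = E @ D @ M.T @ U_p.T @ np.linalg.inv(lam) @ F.T     # construction of Proposition 4.2
print(J(T, sigma_x, sigma_e), G(M, p_evals))           # equal up to round-off
```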
Proposition 4.2 allows us to deduce the existence of $\mathbf{T}_{\mathrm{opt}}$
from the existence of $\mathbf{M}^*$. Then, in order
to find an optimal linear transformation, which is the main
goal of this section, we first need to show the existence of
$\mathbf{M}^*$, and then construct $\mathbf{T}_{\mathrm{opt}}$ using $\mathbf{M}^*$, the solution
of the reduced problem (4.5).

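Proposition 4.3: A set of solutions for (4.5) is given by

$$
\mathcal{M} \triangleq \left\{ \mathbf{M} \in S_M \,\middle|\, \mathbf{M} = \mathbf{Q}^T \begin{bmatrix} \mathbf{\Gamma}_m \\ \mathbf{0}_{(n-m) \times m} \end{bmatrix} \right\},
$$

where $\mathbf{\Gamma}_m \in \mathbb{R}^{m \times m}$ is a unitary matrix and $\mathbf{Q} \in \{0, 1\}^{n \times n}$ denotes a permutation matrix such that the eigenvalues of $\mathbf{Q}\mathbf{\Lambda}_p\mathbf{Q}^T$ are in non-decreasing order. Moreover,

$$
\max_{\mathbf{M} \in S_M} G(\mathbf{M}) = \prod_{j=1}^{m} \frac{1}{2}\left(1 + \frac{1}{\lambda_j(\mathbf{M}^T\tilde{\mathbf{\Lambda}}_p\mathbf{M})}\right)\Bigg|_{\mathbf{M} \in \mathcal{M}} = \prod_{i \in \mathcal{I}} \frac{1}{2}\left(1 + \frac{1}{\lambda_i(\tilde{\mathbf{\Lambda}}_p)}\right), \tag{4.6}
$$

where $\mathcal{I} \subset \{1, 2, \ldots, n\}$ denotes the cardinality-$m$ index set corresponding to the $m$ smallest eigenvalues of $\mathbf{\Lambda}_p$ (equivalently, of $\tilde{\mathbf{\Lambda}}_p$).

Proof: See Appendix IV. ■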
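Proposition 4.3 reduces the search to selecting the $m$ smallest eigenvalues of $\mathbf{\Lambda}_p$. The NumPy sketch below (helper names are hypothetical) builds $\mathbf{M} = \mathbf{Q}^T[\mathbf{\Gamma}_m; \mathbf{0}]$, evaluates (4.6), and checks that randomly drawn $\mathbf{M} \in S_M$ never exceed it. It uses arbitrary assumed eigenvalues of $\mathbf{P}$ larger than 1, which is the relevant regime because $\mathbf{P} = \mathbf{I}_n + \mathbf{\Lambda}^{-1}\mathbf{F}^T\mathbf{\Sigma}_e\mathbf{F}\mathbf{\Lambda}^{-1}$, so that $\tilde{\mathbf{\Lambda}}_p = \mathbf{I}_n - \mathbf{\Lambda}_p^{-1}$ has eigenvalues in $(0, 1)$.

```python
import numpy as np

def G(M, p_evals):
    """G(M) from Proposition 4.2, with tilde-Lambda_p = I - Lambda_p^{-1} (diagonal)."""
    lam_tilde = np.diag(1.0 - 1.0 / p_evals)
    mu = np.linalg.eigvalsh(M.T @ lam_tilde @ M)
    return float(np.prod(0.5 * (1.0 + 1.0 / mu)))

def optimal_M(p_evals, m, gamma=None):
    """M = Q^T [Gamma_m; 0], selecting the m smallest eigenvalues of Lambda_p."""
    n = p_evals.size
    Q = np.eye(n)[np.argsort(p_evals)]          # rows permuted: Q Lambda_p Q^T is non-decreasing
    gamma = np.eye(m) if gamma is None else gamma
    return Q.T @ np.vstack([gamma, np.zeros((n - m, m))])

def max_G(p_evals, m):
    """Right-hand side of (4.6)."""
    smallest = np.sort(1.0 - 1.0 / p_evals)[:m]
    return float(np.prod(0.5 * (1.0 + 1.0 / smallest)))

rng = np.random.default_rng(4)
n, m = 10, 4
p_evals = 1.0 + rng.uniform(0.1, 3.0, size=n)   # assumed eigenvalues of P (all exceed 1)
M_star = optimal_M(p_evals, m)
best = max_G(p_evals, m)
random_values = [G(np.linalg.qr(rng.standard_normal((n, m)))[0], p_evals) for _ in range(1000)]
print(G(M_star, p_evals), best, max(random_values) <= best)
```

The random comparison is consistent with the eigenvalue interlacing property: for orthonormal $\mathbf{M}$, the $j$-th smallest eigenvalue of $\mathbf{M}^T\tilde{\mathbf{\Lambda}}_p\mathbf{M}$ is never smaller than the $j$-th smallest eigenvalue of $\tilde{\mathbf{\Lambda}}_p$, and each factor of $G$ decreases in its argument.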
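Theorem 4.1: A set of optimal linear transforms, in the sense of the expected Chernoff bound on the probability of error $P_e$, for communication with partial information in the Gaussian setup is given by

$$
\mathcal{T} = \left\{ \mathbf{T} \in S_T \,\middle|\, \mathbf{T} = \mathbf{E}\mathbf{D}\mathbf{M}^T\mathbf{U}_p^T\mathbf{\Lambda}^{-1}\mathbf{F}^T \right\}, \tag{4.7}
$$

where $\mathbf{E} \in \mathbb{R}^{m \times m}$ is unitary, $\mathbf{D} \in \mathbb{R}^{m \times m}$ is diagonal positive-definite, $\mathbf{M} \in \mathcal{M}$, $S_T = \{\mathbf{T} \in \mathbb{R}^{m \times n} \mid r(\mathbf{T}) = m\}$, $\mathcal{M}$ is given by Proposition 4.3, and $\mathbf{F}$, $\mathbf{\Lambda}$ and $\mathbf{U}_p$ denote the matrix of eigenvectors of $\mathbf{\Sigma}_x$, the diagonal matrix of the square roots of the eigenvalues of $\mathbf{\Sigma}_x$ (i.e., $\mathbf{\Sigma}_x = \mathbf{F}\mathbf{\Lambda}^2\mathbf{F}^T$), and the matrix of eigenvectors of $\mathbf{P} = \mathbf{\Lambda}^{-1}\mathbf{F}^T(\mathbf{\Sigma}_x + \mathbf{\Sigma}_e)\mathbf{F}\mathbf{\Lambda}^{-1}$, respectively.

Proof: By Proposition 4.2, for a given $\mathbf{M}^*$, i.e., an $\mathbf{M}$ satisfying (4.5), $\mathbf{T} = \mathbf{E}\mathbf{D}(\mathbf{M}^*)^T\mathbf{U}_p^T\mathbf{\Lambda}^{-1}\mathbf{F}^T$ satisfies (4.4), i.e., $\mathbf{T} = \mathbf{T}_{\mathrm{opt}}$ (cf. Appendix III). Moreover, a nonempty set of matrices satisfying (4.5), namely $\mathcal{M}$, is given by Proposition 4.3. This implies that $\mathcal{T} \neq \emptyset$ and that $\mathcal{T}$, induced by $\mathcal{M}$, is a set of optimal linear transforms in the sense of the expected Chernoff bound on the probability of error $P_e$. ■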
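Theorem 4.1 describes a family rather than a single transform. The following NumPy sketch (helper names and random covariances are illustrative assumptions) draws several members of $\mathcal{T}$ with different $\mathbf{E}$, $\mathbf{D}$ and $\mathbf{\Gamma}_m$ and confirms that each attains the maximum value (4.6) of $J$.

```python
import numpy as np

def whitening_parts(sigma_x, sigma_e):
    """Return F, Lambda^{-1}, U_p and eig(P) (see Proposition 4.2)."""
    evals, F = np.linalg.eigh(sigma_x)                   # Sigma_x = F Lambda^2 F^T
    lam_inv = np.diag(1.0 / np.sqrt(evals))
    P = lam_inv @ F.T @ (sigma_x + sigma_e) @ F @ lam_inv
    p_evals, U_p = np.linalg.eigh((P + P.T) / 2)         # P = U_p Lambda_p U_p^T
    return F, lam_inv, U_p, p_evals

def J(T, sigma_x, sigma_e):
    """det(I_m + W / 2), the objective in (4.4)."""
    sigma_z = T @ sigma_x @ T.T
    s = sigma_x + sigma_e - sigma_x @ T.T @ np.linalg.solve(sigma_z, T @ sigma_x)
    W = T @ sigma_x @ np.linalg.solve(s, sigma_x @ T.T) @ np.linalg.inv(sigma_z)
    return float(np.linalg.det(np.eye(T.shape[0]) + 0.5 * W))

def member_of_optimal_set(sigma_x, sigma_e, E, D, gamma):
    """T = E D M^T U_p^T Lambda^{-1} F^T with M = Q^T [gamma; 0] (Proposition 4.3)."""
    F, lam_inv, U_p, p_evals = whitening_parts(sigma_x, sigma_e)
    n, m = p_evals.size, gamma.shape[0]
    Q = np.eye(n)[np.argsort(p_evals)]                   # Q Lambda_p Q^T is non-decreasing
    M = Q.T @ np.vstack([gamma, np.zeros((n - m, m))])
    return E @ D @ M.T @ U_p.T @ lam_inv @ F.T

def max_G(sigma_x, sigma_e, m):
    """Right-hand side of (4.6)."""
    *_, p_evals = whitening_parts(sigma_x, sigma_e)
    mu = np.sort(1.0 - 1.0 / p_evals)[:m]
    return float(np.prod(0.5 * (1.0 + 1.0 / mu)))

rng = np.random.default_rng(5)
n, m = 9, 4
root_x = rng.standard_normal((n, n))
root_e = rng.standard_normal((n, n))
sigma_x = root_x @ root_x.T + np.eye(n)
sigma_e = root_e @ root_e.T + np.eye(n)
target = max_G(sigma_x, sigma_e, m)
values = []
for trial in range(5):
    E, _ = np.linalg.qr(rng.standard_normal((m, m)))     # arbitrary orthogonal E
    gamma, _ = np.linalg.qr(rng.standard_normal((m, m))) # arbitrary orthogonal Gamma_m
    D = np.diag(rng.uniform(0.1, 10.0, size=m))          # arbitrary positive diagonal D
    T = member_of_optimal_set(sigma_x, sigma_e, E, D, gamma)
    values.append(J(T, sigma_x, sigma_e))
print(target, values)
```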



V. Numerical Results

Optimality of $\mathbf{T}^*$: Theorem 4.1 gives a set of optimal linear transforms; however, it does not address the "denseness" of $\mathcal{T}$ in $S_T$: Is it easy to find an optimal transform in $S_T$ randomly, and how far is the performance of transforms in $S_T \setminus \mathcal{T}$ from that of optimal transforms? The computational results in Fig. 2 provide an experimental basis for answering these questions. In Fig. 2, the simulations are performed with $\mathbf{\Sigma}_x$ and $\mathbf{\Sigma}_e$ having uniformly distributed eigenvalues, and the result is given using the reciprocal of the Chernoff bound on $P_e$ to improve visibility. The first observation is that it is not "easy" to guess an element of $\mathcal{T}$ randomly (we actually simulated a much larger number of trials, but give here the result for a set of 1000 trials for illustrative purposes). This is clear from the fact that none of the transforms chosen randomly from $S_T$ achieves the optimal value calculated from (4.6) in Proposition 4.3, except $\mathbf{T}_{\mathrm{opt}}$ constructed by (4.7) and indicated as the transform in the middle of the set of transforms, i.e., $\mathbf{T}_{500}$. Also, the minimum value of the bound on $P_e$ achieved by arbitrary choices is not even close to that achieved by $\mathbf{T}_{\mathrm{opt}}$: it is around 4 times larger than the minimum bound on $P_e$. Thus, we experimentally conjecture that $\mathcal{T}$ is not "dense" in $S_T$.
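The experiment of Fig. 2 can be approximated as follows. This is a sketch, not the authors' simulation code: it assumes NumPy, covariance eigenvalues drawn uniformly from [0.1, 1] with random eigenbases, noise scaled so that SNR = 1, and random transforms with i.i.d. standard normal entries (the text does not state how the random transforms were drawn). It reports the optimal bound obtained from (4.6) and the smallest bound among 1000 random transforms.

```python
import numpy as np

def random_spd(n, rng):
    """Assumed model: random orthonormal eigenbasis, eigenvalues ~ Uniform(0.1, 1)."""
    basis, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = basis @ np.diag(rng.uniform(0.1, 1.0, size=n)) @ basis.T
    return (s + s.T) / 2

def chernoff_bound(T, sigma_x, sigma_e):
    """Expected Chernoff bound (4.3) for a given full-rank T."""
    sigma_z = T @ sigma_x @ T.T
    s = sigma_x + sigma_e - sigma_x @ T.T @ np.linalg.solve(sigma_z, T @ sigma_x)
    W = T @ sigma_x @ np.linalg.solve(s, sigma_x @ T.T) @ np.linalg.inv(sigma_z)
    return 0.5 / np.sqrt(np.linalg.det(np.eye(T.shape[0]) + 0.5 * W))

def optimal_bound(sigma_x, sigma_e, m):
    """0.5 * (max_M G(M))^(-1/2), using (4.6)."""
    evals, F = np.linalg.eigh(sigma_x)
    lam_inv = np.diag(1.0 / np.sqrt(evals))
    P = lam_inv @ F.T @ (sigma_x + sigma_e) @ F @ lam_inv
    p_evals = np.linalg.eigvalsh((P + P.T) / 2)
    mu = np.sort(1.0 - 1.0 / p_evals)[:m]                 # m smallest eigenvalues of tilde-Lambda_p
    return 0.5 / np.sqrt(np.prod(0.5 * (1.0 + 1.0 / mu)))

rng = np.random.default_rng(7)
n, m, trials = 15, 5, 1000
sigma_x, sigma_e = random_spd(n, rng), random_spd(n, rng)
sigma_e *= np.trace(sigma_x) / np.trace(sigma_e)          # SNR = E||x||^2 / E||e||^2 = 1
best = optimal_bound(sigma_x, sigma_e, m)
random_bounds = np.array([chernoff_bound(rng.standard_normal((m, n)), sigma_x, sigma_e)
                          for _ in range(trials)])
print(best, random_bounds.min(), random_bounds.min() / best)
```

The printed ratio depends on the assumed covariance model and on how random transforms are drawn, so it need not match the factor reported for Fig. 2.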
[Figure: "Reciprocal of the Chernoff Bound on the Probability of Error for Various Linear Transforms"; $n = 15$, $m = 5$, $E(\|\mathbf{x}\|^2)/E(\|\mathbf{e}\|^2) = 1$; curves: $1/P_e$ for randomly chosen $\mathbf{T}$ and for $\mathbf{T} = \mathbf{T}_{\mathrm{opt}}$]

Fig. 2. Performance of $\mathbf{T}_{\mathrm{opt}}$ compared to arbitrary $\mathbf{T} \in S_T$.

$P_e$ vs. $E(\|\mathbf{x}\|^2)/E(\|\mathbf{e}\|^2)$: In this part, we observe the effect of $\mathrm{SNR} = E(\|\mathbf{x}\|^2)/E(\|\mathbf{e}\|^2)$ on the optimality of $\mathbf{T}_{\mathrm{opt}}$; Fig. 3 illustrates this effect. As in the setup of Fig. 2, the simulations are performed with $\mathbf{\Sigma}_x$ and $\mathbf{\Sigma}_e$ having uniformly distributed eigenvalues. As expected, the performance at optimality improves with increasing SNR, since it becomes easier to discriminate between the two hypotheses in that case.

[Figure: "Performance at Optimality for Changing SNR"]

Fig. 3. Performance of $\mathbf{T}_{\mathrm{opt}}$ vs. SNR (dB); here, $P_e$ denotes the Chernoff bound on the expected probability of error.

$P_e$ vs. $m$: In this case, we study the effect of the amount of partial information available at the detector side on the bound on the expected performance of the detector. This case is studied for $\mathbf{\Sigma}_x$ and $\mathbf{\Sigma}_e$ having uniformly distributed eigenvalues and SNR = 1. For $n = 50$, we construct $\mathbf{T}_{\mathrm{opt}}$ for particular values of $m$ and evaluate its performance in the sense of the expected Chernoff bound on $P_e$. Results are shown in Fig. 4. As expected, the capability of the detector improves as the amount of partial information increases. Also, as $m$ tends to $n$, the performance at optimality converges to that for $m = n$, which is the Gaussian bound (the case when $\mathbf{T}$ is invertible).

[Figure: "The Effect of Changing Partial Information Length on the Performance"; $n = 50$, SNR = 1; curves: $P_e$ for $\mathbf{T} = \mathbf{T}_{\mathrm{opt}}$ and the Gaussian bound ($\mathbf{T}$ invertible)]

Fig. 4. Performance of $\mathbf{T}_{\mathrm{opt}}$ vs. $m$ (length of partial information).

$P_e$ vs. $n$: In this part, we study the effect of changes in signal length on the performance of $\mathbf{T}_{\mathrm{opt}}$. The simulation results, for various $\mathbf{\Sigma}_x$ and $\mathbf{\Sigma}_e$ all having uniformly distributed eigenvalues, are shown in Fig. 5. At first glance, the results might seem counter-intuitive. The crucial point is that, since $m$ (the dimension of the partial information space) is constant, as $n$ increases we get more degrees of freedom to construct $\mathbf{T}_{\mathrm{opt}}$ (i.e., the number of eigenvalues of $\mathbf{P}$ increases, and so does the value of (4.6)), improving the detector performance.
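The trend in Fig. 4 can be read off (4.6): with $\tilde{\mathbf{\Lambda}}_p = \mathbf{I}_n - \mathbf{\Lambda}_p^{-1}$, each additional eigenvalue contributes a factor $\frac{1}{2}(1 + 1/\lambda_i(\tilde{\mathbf{\Lambda}}_p)) > 1$, so the optimal bound decreases with $m$ and, at $m = n$, reaches the invertible-$\mathbf{T}$ ("Gaussian") bound $\frac{1}{2}\det(\mathbf{I}_n + \frac{1}{2}\mathbf{\Sigma}_x\mathbf{\Sigma}_e^{-1})^{-1/2}$. The NumPy sketch below computes this curve for $n = 50$ and SNR = 1; the covariance model (uniform eigenvalues in [0.1, 1], random eigenbases) and function names are assumptions for illustration.

```python
import numpy as np

def random_spd(n, rng):
    """Assumed model: random orthonormal eigenbasis, eigenvalues ~ Uniform(0.1, 1)."""
    basis, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = basis @ np.diag(rng.uniform(0.1, 1.0, size=n)) @ basis.T
    return (s + s.T) / 2

def optimal_bound_curve(sigma_x, sigma_e):
    """Optimal expected Chernoff bound for every m = 1, ..., n (entry m - 1)."""
    evals, F = np.linalg.eigh(sigma_x)
    lam_inv = np.diag(1.0 / np.sqrt(evals))
    P = lam_inv @ F.T @ (sigma_x + sigma_e) @ F @ lam_inv
    mu = np.sort(1.0 - 1.0 / np.linalg.eigvalsh((P + P.T) / 2))  # eigenvalues of tilde-Lambda_p
    factors = 0.5 * (1.0 + 1.0 / mu)                             # each factor exceeds 1
    return 0.5 / np.sqrt(np.cumprod(factors))

def gaussian_bound(sigma_x, sigma_e):
    """Invertible-T case (m = n): Sigma_{y|z} = Sigma_e and W = Sigma_x Sigma_e^{-1}."""
    n = sigma_x.shape[0]
    W = sigma_x @ np.linalg.inv(sigma_e)
    return 0.5 / np.sqrt(np.linalg.det(np.eye(n) + 0.5 * W))

rng = np.random.default_rng(8)
n = 50
sigma_x, sigma_e = random_spd(n, rng), random_spd(n, rng)
sigma_e *= np.trace(sigma_x) / np.trace(sigma_e)                 # SNR = 1
curve = optimal_bound_curve(sigma_x, sigma_e)
print(curve[[0, 9, 24, 49]], gaussian_bound(sigma_x, sigma_e))
```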

Fig. 5. Performance of $\mathbf{T}_{\mathrm{opt}}$ vs. $n$ (signal length).

VI. Conclusions

We introduce the concept of communication with partial
information. The main idea is that the codebooks used by
the transmitter and the receiver are different. This concept is
different from that of communication with side information,
where the utilized codebooks are the same but there is extra
information available to one of the communicating parties.

Within the context of communication with partial infor- 
mation, we particularly concentrate on a binary detection 
theoretic scenario. The transmitter sends one of the two 
codewords (which are independent realizations of a colored 
multivariate Gaussian distribution) to the additive colored 
Gaussian noise channel. The receiver acts as a detector, using 
dimensionality reduced versions of the encoder codewords, 
where the dimensionality reduction is achieved via a linear 
transform. We first find the optimal (in the sense of prob- 
ability of error) detection rule. Then we derive the optimal 
class of linear transforms in the sense of the expected value 
of the Chernoff bound on the conditional probability of error 
of the detector. 

Although the focus here is on binary detection, we believe 
that the proposed "communication with partial information" 
covers several setups of interest, especially the cases where 
there is an inherent asymmetry between the transmitter and 
the receiver due to the unbalanced limitations on the physical 
resources, such as memory and computational power. In our 
future research, we plan to explore various communication 
theoretic setups where asymmetry is a crucial feature. 

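Appendix I
Proof of Theorem 3.1

Throughout the proof, we use the definitions $\mathbf{y}_i = \mathbf{x}_i + \mathbf{e}$
for $i \in \{0, 1\}$. Accordingly, we use $\boldsymbol{\mu}_{y_i|z_i} \triangleq E(\mathbf{y}_i|\mathbf{z}_i)$ and
$\mathbf{\Sigma}_{y_i|z_i} \triangleq \mathrm{Cov}(\mathbf{y}_i|\mathbf{z}_i)$. We start with the following lemma.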



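Lemma I.1: For $i \in \{0, 1\}$, conditioned on $\mathbf{z}_i$, $\mathbf{y}_i$ is a
normal random vector. Furthermore,

$$
\mathbf{\Sigma}_{y_i|z_i} = \mathbf{\Sigma}_x + \mathbf{\Sigma}_e - \mathbf{\Sigma}_x\mathbf{T}^T(\mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T)^{-1}\mathbf{T}\mathbf{\Sigma}_x \tag{I.1}
$$

is independent of $i$ and positive definite.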

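Proof: The crucial point is to show that, for $i \in \{0, 1\}$, $\mathbf{y}_i$ and $\mathbf{z}_i$ are jointly normal with a positive definite covariance matrix. First, consider $\begin{bmatrix} \mathbf{x}_i \\ \mathbf{e} \end{bmatrix} \in \mathbb{R}^{2n}$. Since $\mathbf{x}_i$ and $\mathbf{e}$ are both normal and are independent, they are also jointly normal with zero mean and the covariance matrix

$$
\mathbf{H} \triangleq \begin{bmatrix} \mathbf{\Sigma}_x & \mathbf{0} \\ \mathbf{0} & \mathbf{\Sigma}_e \end{bmatrix} \in \mathbb{R}^{2n \times 2n}.
$$

Note that $\mathbf{H}$ is clearly positive definite, since for any $\mathbf{v} = \begin{bmatrix} \mathbf{v}_1 \\ \mathbf{v}_2 \end{bmatrix} \neq \mathbf{0}$, where $\mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^n$, $\mathbf{v}^T\mathbf{H}\mathbf{v} = \mathbf{v}_1^T\mathbf{\Sigma}_x\mathbf{v}_1 + \mathbf{v}_2^T\mathbf{\Sigma}_e\mathbf{v}_2 > 0$ by the positive definiteness of $\mathbf{\Sigma}_x$ and $\mathbf{\Sigma}_e$ (which we assumed). By the same token, $[\mathbf{v}_1^T\mathbf{\Sigma}_x\mathbf{v}_1 + \mathbf{v}_2^T\mathbf{\Sigma}_e\mathbf{v}_2 = 0] \Longleftrightarrow [\mathbf{v}_1 = \mathbf{v}_2 = \mathbf{0}] \Longleftrightarrow [\mathbf{v} = \mathbf{0}]$, yielding the positive definiteness of $\mathbf{H}$.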

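Now, consider the linear transformation from the normal random vector $\begin{bmatrix} \mathbf{x}_i \\ \mathbf{e} \end{bmatrix} \in \mathbb{R}^{2n}$ to the vector $\begin{bmatrix} \mathbf{y}_i \\ \mathbf{z}_i \end{bmatrix} \in \mathbb{R}^{n+m}$, represented by

$$
\mathbf{F} \triangleq \begin{bmatrix} \mathbf{I}_n & \mathbf{I}_n \\ \mathbf{T} & \mathbf{0}_{m \times n} \end{bmatrix} \in \mathbb{R}^{(n+m) \times 2n},
$$

where $\mathbf{0}_{m \times n}$ denotes the $m \times n$ zero matrix. This linear transform establishes the normality of $\begin{bmatrix} \mathbf{y}_i \\ \mathbf{z}_i \end{bmatrix}$ (by the properties of jointly normal random vectors) with zero mean and the covariance matrix

$$
\mathbf{F}\mathbf{H}\mathbf{F}^T = \begin{bmatrix} \mathbf{\Sigma}_x + \mathbf{\Sigma}_e & \mathbf{\Sigma}_x\mathbf{T}^T \\ \mathbf{T}\mathbf{\Sigma}_x & \mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T \end{bmatrix}.
$$

To deduce the positive definiteness of this covariance matrix, i.e., $\mathbf{F}\mathbf{H}\mathbf{F}^T$, it is sufficient to show that $\mathbf{F}$ is full rank. This stems from the fact that if $\mathbf{F}$ is full rank (i.e., if $r(\mathbf{F}) = m + n$, since $m < n$), then for any nonzero vector $\mathbf{s} \in \mathbb{R}^{m+n}$ we have $\mathbf{F}^T\mathbf{s} = \mathbf{w} \neq \mathbf{0} \in \mathbb{R}^{2n}$, since $\mathbf{F}^T$ has a trivial null-space, so we end up with $\mathbf{s}^T\mathbf{F}\mathbf{H}\mathbf{F}^T\mathbf{s} = \mathbf{w}^T\mathbf{H}\mathbf{w} > 0$ by the positive definiteness of $\mathbf{H}$.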

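To establish the full-rank property of $\mathbf{F}$ (equivalent to $\mathbf{F}^T$ having a trivial null-space), consider $\mathbf{a} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \end{bmatrix} \in \mathbb{R}^{n+m}$, where $\mathbf{a}_1 \in \mathbb{R}^n$ and $\mathbf{a}_2 \in \mathbb{R}^m$. In this case,

$$
\mathbf{F}^T\mathbf{a} = \begin{bmatrix} \mathbf{a}_1 + \mathbf{T}^T\mathbf{a}_2 \\ \mathbf{a}_1 \end{bmatrix}.
$$

Suppose there exists some $\mathbf{a} \neq \mathbf{0}$ such that $\mathbf{F}^T\mathbf{a} = \mathbf{0}$. This implies $\mathbf{a}_1 = \mathbf{0}$ and $\mathbf{T}^T\mathbf{a}_2 = \mathbf{0}$. However, since $r(\mathbf{T}) = m$, $[\mathbf{T}^T\mathbf{a}_2 = \mathbf{0}] \Longrightarrow [\mathbf{a}_2 = \mathbf{0}]$. Therefore, $[\mathbf{F}^T\mathbf{a} = \mathbf{0}] \Longrightarrow [\mathbf{a} = \mathbf{0}]$, which is a contradiction. Thus, $\mathbf{F}$ is necessarily full rank, implying the positive definiteness of the covariance matrix of $\begin{bmatrix} \mathbf{y}_i \\ \mathbf{z}_i \end{bmatrix}$, i.e., $\mathbf{F}\mathbf{H}\mathbf{F}^T$.

Finally, the normality of $[\mathbf{y}_i|\mathbf{z}_i]$ follows from the properties of normally distributed random vectors. The positive definiteness of the corresponding covariance matrix $\mathbf{\Sigma}_{y_i|z_i} = \mathbf{\Sigma}_x + \mathbf{\Sigma}_e - \mathbf{\Sigma}_x\mathbf{T}^T(\mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T)^{-1}\mathbf{T}\mathbf{\Sigma}_x$ follows from the fact that it is the inverse of a principal submatrix of the inverse of $\mathbf{F}\mathbf{H}\mathbf{F}^T$, which is positive definite (see (7.1.2) and (7.7.5) in [11]). Also, $\mathbf{\Sigma}_{y_i|z_i}$ is clearly independent of $i \in \{0, 1\}$.

■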
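The lemma of this appendix can be checked numerically on a random instance. In the NumPy sketch below, the random covariances and function names are illustrative, and `F_stack` denotes the $(n+m) \times 2n$ matrix called $\mathbf{F}$ in this appendix (renamed to avoid a clash with the eigenvector matrix $\mathbf{F}$ of $\mathbf{\Sigma}_x$ used in Sec. IV). It checks that this matrix has full rank, that $\mathbf{F}\mathbf{H}\mathbf{F}^T$ is positive definite, and that inverting the leading $n \times n$ block of $(\mathbf{F}\mathbf{H}\mathbf{F}^T)^{-1}$ recovers (I.1).

```python
import numpy as np

def joint_covariance(T, sigma_x, sigma_e):
    """Return F_stack and F_stack H F_stack^T, the covariance of [y_i; z_i]."""
    n, m = sigma_x.shape[0], T.shape[0]
    zeros_nn = np.zeros((n, n))
    H = np.block([[sigma_x, zeros_nn], [zeros_nn, sigma_e]])             # Cov([x_i; e])
    F_stack = np.block([[np.eye(n), np.eye(n)], [T, np.zeros((m, n))]])  # [x_i; e] -> [y_i; z_i]
    return F_stack, F_stack @ H @ F_stack.T

def conditional_covariance(T, sigma_x, sigma_e):
    """(I.1): Sigma_x + Sigma_e - Sigma_x T^T (T Sigma_x T^T)^{-1} T Sigma_x."""
    return sigma_x + sigma_e - sigma_x @ T.T @ np.linalg.solve(T @ sigma_x @ T.T, T @ sigma_x)

rng = np.random.default_rng(9)
n, m = 5, 2
root_x = rng.standard_normal((n, n))
root_e = rng.standard_normal((n, n))
sigma_x = root_x @ root_x.T + np.eye(n)
sigma_e = root_e @ root_e.T + np.eye(n)
T = rng.standard_normal((m, n))
F_stack, joint = joint_covariance(T, sigma_x, sigma_e)
sigma_y_z = conditional_covariance(T, sigma_x, sigma_e)
lead_block = np.linalg.inv(joint)[:n, :n]         # leading n x n block of (F H F^T)^{-1}
print(np.linalg.matrix_rank(F_stack), np.allclose(np.linalg.inv(lead_block), sigma_y_z))
```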

Per Lemma I.1, since $\mathbf{\Sigma}_{y|z} \triangleq \mathbf{\Sigma}_{y_i|z_i}$ is positive definite,
it is invertible and has an invertible square root.

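Remark I.1: From the properties of normal random vectors,
we have

$$
\boldsymbol{\mu}_{y_i|z_i} = E(\mathbf{y}_i|\mathbf{z}_i) = \mathbf{\Sigma}_x\mathbf{T}^T(\mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T)^{-1}\mathbf{z}_i. \tag{I.2}
$$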



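Now let $\beta \triangleq \left[(2\pi)^{n/2}\det(\mathbf{\Sigma}_{y|z})^{1/2}\right]^{-1}$, $\theta \triangleq (\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1})^T\mathbf{\Sigma}_{y|z}^{-1}\mathbf{y}$, and, for $i = 0, 1$, $\alpha_i \triangleq (\mathbf{y} - \boldsymbol{\mu}_{y_i|z_i})^T\mathbf{\Sigma}_{y|z}^{-1}(\mathbf{y} - \boldsymbol{\mu}_{y_i|z_i})$ and $\kappa_i \triangleq \boldsymbol{\mu}_{y_i|z_i}^T\mathbf{\Sigma}_{y|z}^{-1}\boldsymbol{\mu}_{y_i|z_i}$, where $\mathbf{\Sigma}_{y|z}$ and $\boldsymbol{\mu}_{y_i|z_i}$ are given in (I.1) and (I.2), respectively. Then, using Lemma I.1 and Remark I.1, given that $\mathbf{y}$ is observed, we have

$$
p(\mathbf{y}_i|\mathbf{z}_i)\big|_{\mathbf{y}_i = \mathbf{y}} = \beta\exp\left(-\frac{\alpha_i}{2}\right). \tag{I.3}
$$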



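Then, using the above distribution of $[\mathbf{y}_i|\mathbf{z}_i]$, the maximum-likelihood detection rule (3.2) can be written as

$$
\beta\exp\left(-\frac{\alpha_0}{2}\right) \overset{H_0}{\underset{H_1}{\gtrless}} \beta\exp\left(-\frac{\alpha_1}{2}\right),
$$

which is equivalent to (3.3), since $\det(\mathbf{\Sigma}_{y|z}) \neq 0$ (so that $\beta > 0$) and $\exp(\cdot)$ is a strictly increasing function of its argument. Moreover,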



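$$
\begin{aligned}
P_{e|H_0} &= \Pr\left[\alpha_0 > \alpha_1 \,\middle|\, \mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}_{y_0|z_0}, \mathbf{\Sigma}_{y|z})\right]\\
&= \Pr\left[\theta < \frac{\kappa_0 - \kappa_1}{2} \,\middle|\, \mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}_{y_0|z_0}, \mathbf{\Sigma}_{y|z})\right],
\end{aligned}
$$

where $P_{e|H_0}$ denotes the probability of error conditioned on $H_0$. Here, conditioned on $H_0$, the random variable $\theta$ is normal, since $(\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1})^T\mathbf{\Sigma}_{y|z}^{-1}$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}$ and $\mathbf{y}|H_0$ is normal. Conditioned on $H_0$, the mean and variance of $\theta$ are given by

$$
\mu_{\theta|H_0} = (\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1})^T\mathbf{\Sigma}_{y|z}^{-1}\boldsymbol{\mu}_{y_0|z_0}, \qquad
\sigma^2_{\theta|H_0} = (\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1})^T\mathbf{\Sigma}_{y|z}^{-1}(\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1}).
$$

Then, $\Pr[\text{error}|H_0]$ is given in terms of the standard Q-function. As a result, using $\mu_{\theta|H_0} - (\kappa_0 - \kappa_1)/2 = \sigma^2_{\theta|H_0}/2$, we get

$$
P_{e|H_0} = Q\left(\frac{\mu_{\theta|H_0} - (\kappa_0 - \kappa_1)/2}{\sigma_{\theta|H_0}}\right) = Q\left(\frac{\sigma_{\theta|H_0}}{2}\right) = Q\left(\frac{1}{2}\left\|\mathbf{\Sigma}_{y|z}^{-1/2}\left(\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1}\right)\right\|\right).
$$

Furthermore, from symmetry, we have $P_{e|\mathbf{z}_0,\mathbf{z}_1} = P_{e|H_0}$. ■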
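The error probability derived above can be checked by Monte Carlo: conditioned on $H_0$ and $\mathbf{z}_0$, $\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}_{y_0|z_0}, \mathbf{\Sigma}_{y|z})$, and the empirical frequency of $\alpha_0 > \alpha_1$ should approach $Q(\sigma_{\theta|H_0}/2)$. The NumPy sketch below uses hypothetical function names and an assumed random instance.

```python
import math
import numpy as np

def conditional_model(T, sigma_x, sigma_e):
    """A (with mu_{y_i|z_i} = A z_i, cf. (I.2)) and Sigma_{y|z} from (I.1)."""
    sigma_z = T @ sigma_x @ T.T
    A = sigma_x @ T.T @ np.linalg.inv(sigma_z)
    s = sigma_x + sigma_e - A @ T @ sigma_x
    return A, (s + s.T) / 2

def error_rate_given_h0(z0, z1, T, sigma_x, sigma_e, draws, rng):
    """Monte Carlo P_{e|H0}: sample y ~ N(mu_0, Sigma_{y|z}) and count alpha_0 > alpha_1."""
    A, s = conditional_model(T, sigma_x, sigma_e)
    mu0, mu1 = A @ z0, A @ z1
    y = rng.multivariate_normal(mu0, s, size=draws)
    s_inv = np.linalg.inv(s)
    alpha0 = np.einsum("ij,jk,ik->i", y - mu0, s_inv, y - mu0)
    alpha1 = np.einsum("ij,jk,ik->i", y - mu1, s_inv, y - mu1)
    return float(np.mean(alpha0 > alpha1))               # error: H1 chosen under H0

def closed_form_error(z0, z1, T, sigma_x, sigma_e):
    """Q(sigma_theta / 2), sigma_theta^2 = (mu_0 - mu_1)^T Sigma_{y|z}^{-1} (mu_0 - mu_1)."""
    A, s = conditional_model(T, sigma_x, sigma_e)
    diff = A @ (z0 - z1)
    sigma_theta = math.sqrt(diff @ np.linalg.solve(s, diff))
    return 0.5 * math.erfc(sigma_theta / (2.0 * math.sqrt(2.0)))

rng = np.random.default_rng(10)
n, m = 6, 2
root_x = rng.standard_normal((n, n))
root_e = rng.standard_normal((n, n))
sigma_x = root_x @ root_x.T + np.eye(n)
sigma_e = root_e @ root_e.T + np.eye(n)
T = rng.standard_normal((m, n))
z0 = T @ rng.multivariate_normal(np.zeros(n), sigma_x)
z1 = T @ rng.multivariate_normal(np.zeros(n), sigma_x)
mc = error_rate_given_h0(z0, z1, T, sigma_x, sigma_e, 200_000, rng)
exact = closed_form_error(z0, z1, T, sigma_x, sigma_e)
print(mc, exact)
```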

Appendix II
Proof of Proposition 4.1



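First, we recall the standard Chernoff bound on the Q-function: $Q(x) \le \frac{1}{2}\exp\left(-\frac{x^2}{2}\right)$ for $x \ge 0$ [12]. Then, (4.2) is obvious via using it in (3.4). Next, we have

$$
\begin{aligned}
P_e &\le E_{\{\mathbf{z}_0,\mathbf{z}_1\}}\left[\frac{1}{2}\exp\left(-\frac{1}{8}\left\|\mathbf{\Sigma}_{y|z}^{-1/2}\left(\boldsymbol{\mu}_{y_0|z_0} - \boldsymbol{\mu}_{y_1|z_1}\right)\right\|^2\right)\right] && \text{(II.4)}\\
&= E_{\{\mathbf{z}_0,\mathbf{z}_1\}}\left[\frac{1}{2}\exp\left(-\frac{1}{8}(\mathbf{z}_0 - \mathbf{z}_1)^T\mathbf{A}^T\mathbf{B}^{-1}\mathbf{A}(\mathbf{z}_0 - \mathbf{z}_1)\right)\right] && \text{(II.5)}\\
&= E_{\boldsymbol{\gamma}}\left[\frac{1}{2}\exp\left(-\frac{1}{8}\boldsymbol{\gamma}^T\mathbf{A}^T\mathbf{B}^{-1}\mathbf{A}\boldsymbol{\gamma}\right)\right] && \text{(II.6)}\\
&= \frac{1}{2}\int_{\mathbb{R}^m} \frac{1}{(2\pi)^{m/2}\det(2\mathbf{\Sigma}_z)^{1/2}}\exp\left(-\frac{1}{2}\boldsymbol{\gamma}^T\left[(2\mathbf{\Sigma}_z)^{-1} + \frac{1}{4}\mathbf{A}^T\mathbf{B}^{-1}\mathbf{A}\right]\boldsymbol{\gamma}\right)d\boldsymbol{\gamma}, && \text{(II.7)}
\end{aligned}
$$

where (II.4) follows from using (4.2) in (4.1); (II.5) follows from using the definitions of $\mathbf{\Sigma}_{y|z}$, $\boldsymbol{\mu}_{y_0|z_0}$ and $\boldsymbol{\mu}_{y_1|z_1}$ (cf. Theorem 3.1) and defining $\mathbf{A} \triangleq \mathbf{\Sigma}_x\mathbf{T}^T(\mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T)^{-1}$, $\mathbf{B} \triangleq \mathbf{\Sigma}_{y|z}$ and $\boldsymbol{\gamma} \triangleq \mathbf{z}_0 - \mathbf{z}_1$; (II.6) follows since the only source of randomness is $\boldsymbol{\gamma}$ per our reparametrization; and (II.7) follows since $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, 2\mathbf{\Sigma}_z)$, where $\mathbf{\Sigma}_z \triangleq \mathrm{Cov}(\mathbf{z}_0) = \mathrm{Cov}(\mathbf{z}_1) = \mathbf{T}\mathbf{\Sigma}_x\mathbf{T}^T$.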

Next, we show that the matrix $\left[(2\Sigma_z)^{-1} + \frac{1}{4}A^T B^{-1} A\right]^{-1}$ is positive definite, which ensures that it is a valid covariance matrix. First, by assumption, $\Sigma_x$ is positive definite and $T$ is full-rank. Hence, using steps similar to those used to prove the positive definiteness of $F\Sigma F^T$ within the proof of Lemma 1.1, we conclude that $\Sigma_z = T\Sigma_x T^T$ is positive definite. Furthermore, $[\Sigma_z = T\Sigma_x T^T \succ 0] \Rightarrow [2\Sigma_z \succ 0] \Leftrightarrow [(2\Sigma_z)^{-1} \succ 0]$. Next, note that $A$ has full column rank by straightforward linear algebra. Using this result and the positive definiteness of $B$, and applying arguments similar to those above, we conclude that $A^T B^{-1} A$ is positive definite as well. Thus, $\left[(2\Sigma_z)^{-1} + \frac{1}{4}A^T B^{-1} A\right]^{-1}$ is positive definite, since it is the inverse of a sum of two positive definite matrices, and such a sum is itself positive definite. As a result, this matrix is a valid covariance matrix and the integral (II.7) converges, yielding



$$P_e \le \frac{1}{2}\det\left(I_m + \frac{1}{2}\Sigma_z A^T B^{-1} A\right)^{-1/2}$$

by properties of determinants; hence the proof. □
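The determinant expression can be cross-checked against the expectation it replaces. The sketch below uses hypothetical random $\Sigma_x$, $T$ and $B = \Sigma_{y|z}$; any positive-definite $B$ suffices for this Gaussian-integral identity. It evaluates three quantities:

- $\frac{1}{2}\det(I_m + \frac{1}{2}\Sigma_z A^T B^{-1} A)^{-1/2}$;
- a Monte Carlo estimate of $E_\gamma\{\frac{1}{2}\exp(-\frac{1}{8}\gamma^T A^T B^{-1} A\gamma)\}$ with $\gamma \sim \mathcal{N}(0, 2\Sigma_z)$;
- a Monte Carlo estimate of the $Q$-function average before the Chernoff bound is applied.

It also spot-checks $Q(x) \le \frac{1}{2}e^{-x^2/2}$ on a grid.

```python
import math
import numpy as np

def q_func(x):
    """Standard Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def chernoff_det_bound(A, B, sigma_z):
    """(1/2) * det(I_m + (1/2) Sigma_z A^T B^{-1} A)^{-1/2}."""
    m = sigma_z.shape[0]
    K = A.T @ np.linalg.solve(B, A)               # A^T B^{-1} A, an m x m matrix
    return 0.5 / math.sqrt(np.linalg.det(np.eye(m) + 0.5 * sigma_z @ K))

# Hypothetical test data (not from the paper)
rng = np.random.default_rng(2)
n, m, trials = 5, 2, 100_000
G = rng.normal(size=(n, n))
sigma_x = G @ G.T + np.eye(n)
T = rng.normal(size=(m, n))
sigma_z = T @ sigma_x @ T.T
A = sigma_x @ T.T @ np.linalg.inv(sigma_z)
H = rng.normal(size=(n, n))
B = H @ H.T + np.eye(n)                          # positive-definite stand-in for Sigma_{y|z}

K = A.T @ np.linalg.solve(B, A)
gamma = rng.multivariate_normal(np.zeros(m), 2 * sigma_z, size=trials)
quad = np.einsum("ij,jk,ik->i", gamma, K, gamma)  # gamma^T K gamma for each sample

mc_chernoff = float(np.mean(0.5 * np.exp(-quad / 8.0)))
mc_q = float(np.mean([q_func(0.5 * math.sqrt(max(v, 0.0))) for v in quad]))
closed = chernoff_det_bound(A, B, sigma_z)
print(f"det formula {closed:.4f}, MC of bound {mc_chernoff:.4f}, MC of Q-average {mc_q:.4f}")

# Pointwise Chernoff bound Q(x) <= 0.5 * exp(-x^2 / 2) on a grid of x >= 0
grid_ok = all(q_func(x) <= 0.5 * math.exp(-x * x / 2.0) + 1e-15
              for x in np.linspace(0.0, 6.0, 61))
```

Because the Chernoff bound holds sample by sample, the $Q$-function average can never exceed the Monte Carlo estimate of the bound.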

Appendix III
Proof of Proposition 4.2






Our first goal is to show that $T^* \in \mathcal{S}_T$. First, note that $T^*$ is an $m \times n$ matrix by construction. Next, observe that, by definition, $ED$ is an $m \times m$ non-singular matrix and $U_p^T \Lambda^{-1} F^T$ is an $n \times n$ non-singular matrix. Furthermore, $M^*$ is of size $n \times m$ and $r(M^*) = m$, i.e., it is full-rank by definition. Hence, $T^*$ is also of rank $m$, implying that $T^* \in \mathcal{S}_T$. Next, using $\Sigma_x = F\Lambda^2 F^T$ and the definition of $T^*$, after some algebraic manipulations we get

$$F\Lambda U_p M^* = \Sigma_x (T^*)^T E D^{-1}, \tag{III.8}$$

$$E D^2 E^T = T^* \Sigma_x (T^*)^T. \tag{III.9}$$
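Identities (III.8) and (III.9) are easy to confirm numerically. In the sketch below, $\Sigma_x = F\Lambda^2F^T$ comes from an eigendecomposition of a random positive-definite matrix. The remaining factors are hypothetical stand-ins:

- $U_p$: a random orthogonal matrix;
- $M^*$: a random $n \times m$ matrix with orthonormal columns;
- $E$: a random orthogonal matrix;
- $D$: a random positive diagonal matrix.

The two identities rely only on these structural properties, not on the specific optimal $M^*$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 3

def random_orthogonal(k, rng):
    """Random k x k orthogonal matrix via QR; column signs fixed by diag(R)."""
    q, r = np.linalg.qr(rng.normal(size=(k, k)))
    return q * np.sign(np.diag(r))

G = rng.normal(size=(n, n))
sigma_x = G @ G.T + np.eye(n)
w, F = np.linalg.eigh(sigma_x)           # Sigma_x = F diag(w) F^T
Lam = np.diag(np.sqrt(w))                # so Sigma_x = F Lam^2 F^T

U_p = random_orthogonal(n, rng)          # hypothetical stand-in for U_p
M = random_orthogonal(n, rng)[:, :m]     # hypothetical n x m orthonormal M*
E = random_orthogonal(m, rng)
D = np.diag(rng.uniform(0.5, 2.0, size=m))

T = E @ D @ M.T @ U_p.T @ np.linalg.inv(Lam) @ F.T   # T* = E D M*^T U_p^T Lam^{-1} F^T

lhs_8, rhs_8 = F @ Lam @ U_p @ M, sigma_x @ T.T @ E @ np.linalg.inv(D)   # (III.8)
lhs_9, rhs_9 = E @ D @ D @ E.T, T @ sigma_x @ T.T                       # (III.9)
print("rank of T*:", np.linalg.matrix_rank(T))
```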



Lemma 3.1: For any $M \in \mathcal{S}_M$, letting $T = E D M^T U_p^T \Lambda^{-1} F^T$, where $E \in \mathbb{R}^{m\times m}$ is an arbitrary unitary matrix and $D \in \mathbb{R}^{m\times m}$ is an arbitrary diagonal positive-definite matrix, we have $G(M) = J(T)$.
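Proof: We have

$$
\begin{aligned}
G(M) &= 2^{-m}\,\frac{\det\left(I_m + M^T\Lambda_p M\right)}{\det\left(M^T\Lambda_p M\right)} && \text{(III.10)}\\
&= \det\left(I_m + \frac{1}{2}\, E D M^T U_p^T \Lambda^{-1} F^T F \Lambda^2 F^T \Sigma_{y|z}^{-1} \Sigma_x T^T E D^{-2} E^T\right) && \text{(III.11)}\\
&= J(T) && \text{(III.12)}
\end{aligned}
$$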



where (III.10) follows from the definition of the determinant and properties of positive definite matrices; (III.11) follows from our auxiliary definitions, properties of the defined matrices and the matrix inversion lemma; and (III.12) follows from substituting the auxiliary matrices in (III.11). ■

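Lemma 3.2: For any $T \in \mathcal{S}_T$, there exists $M \in \mathcal{S}_M$ such that $J(T) = G(M)$.

Proof: For any $T \in \mathcal{S}_T$, let $E$ and $D$ be given by the SVD of $T\Sigma_x T^T$, i.e., $T\Sigma_x T^T = E D^2 E^T$. Since $T\Sigma_x T^T$ is symmetric positive definite, this coincides with its eigendecomposition. Hence $E \in \mathbb{R}^{m\times m}$ and $D \in \mathbb{R}^{m\times m}$ are unitary and positive-definite diagonal, respectively. Then let $M = U_p^T \Lambda F^T T^T E D^{-1}$. First, we show that $M \in \mathcal{S}_M$. Clearly, $M \in \mathbb{R}^{n\times m}$. Here,

$$
\begin{aligned}
M^T M &= D^{-1} E^T T F \Lambda U_p U_p^T \Lambda F^T T^T E D^{-1}\\
&= D^{-1} E^T T F \Lambda^2 F^T T^T E D^{-1}\\
&= D^{-1} E^T T \Sigma_x T^T E D^{-1}\\
&= D^{-1} E^T E D^2 E^T E D^{-1} = I_m,
\end{aligned}
$$

implying $M \in \mathcal{S}_M$. Now, note that $T = E D M^T U_p^T \Lambda^{-1} F^T$ due to the way $M$ was defined. This means that $T$ is of the functional form given in the statement of Prop. 4.2. Also, $E$ is unitary and $D$ is diagonal and positive definite. Therefore, we necessarily have $G(M) = J(T)$ per Lemma 3.1. Hence the proof. ■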
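The construction in the proof of Lemma 3.2 can be exercised numerically in four steps:

1. start from an arbitrary full-rank $T$;
2. factor $T\Sigma_xT^T = ED^2E^T$;
3. form $M = U_p^T\Lambda F^TT^TED^{-1}$;
4. check that $M^TM = I_m$ and that $T = EDM^TU_p^T\Lambda^{-1}F^T$.

The sketch uses `np.linalg.eigh` for the factorization. For a symmetric positive-definite matrix this gives the same $ED^2E^T$ form as the SVD. $U_p$ is a hypothetical random orthogonal matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 6, 3
G = rng.normal(size=(n, n))
sigma_x = G @ G.T + np.eye(n)
w, F = np.linalg.eigh(sigma_x)
Lam = np.diag(np.sqrt(w))                        # Sigma_x = F Lam^2 F^T
U_p, _ = np.linalg.qr(rng.normal(size=(n, n)))   # hypothetical orthogonal U_p

T = rng.normal(size=(m, n))                      # arbitrary full-rank T in S_T
d2, E = np.linalg.eigh(T @ sigma_x @ T.T)        # T Sigma_x T^T = E diag(d2) E^T
D = np.diag(np.sqrt(d2))                         # so T Sigma_x T^T = E D^2 E^T

M = U_p.T @ Lam @ F.T @ T.T @ E @ np.linalg.inv(D)        # M as defined in the proof
T_back = E @ D @ M.T @ U_p.T @ np.linalg.inv(Lam) @ F.T   # functional form of Prop. 4.2
print("M^T M = I_m:", np.allclose(M.T @ M, np.eye(m)))
print("T recovered:", np.allclose(T_back, T))
```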
Now, we go back to the proof of Prop. 4.2 and use proof by contradiction. Suppose there exists some $T \in \mathcal{S}_T$ such that $J(T) > J(T^*)$. By Lemma 3.1, we necessarily have $J(T^*) = G(M^*)$. Furthermore, by Lemma 3.2, there exists $M \in \mathcal{S}_M$ such that $J(T) = G(M)$. But this implies $G(M) = J(T) > J(T^*) = G(M^*)$, which contradicts the way $M^*$ was defined in the first place. Hence the contradiction and the proof. □

Appendix IV
Proof of Proposition 4.3

First, observe that $G(M)$ is a product of positive real numbers, since $\lambda_i(M^T\Lambda_p M) > 0$ for all $i$ by the positive definiteness of $M^T\Lambda_p M$ (because $M$ is orthonormal and full-rank and $\Lambda_p$ is positive definite). So, in order to maximize $G(M)$, we follow the strategy of maximizing each positive factor $\left(1 + \frac{1}{\lambda_i(M^T\Lambda_p M)}\right)$ for all $i$, which is clearly equivalent to minimizing $\lambda_i(M^T\Lambda_p M)$ for all $i$. Here, let $Q \in \mathbb{R}^{n\times n}$ denote a permutation matrix such that $\Lambda_p = Q^T\tilde{\Lambda}_p Q$, where the matrix $\tilde{\Lambda}_p$ is diagonal and its eigenvalues (i.e., its diagonal entries) are in non-decreasing order². Then, $G(M)$ can be rewritten as

$$G(M) = 2^{-m}\prod_{i=1}^{m}\left(1 + \frac{1}{\lambda_i\left(M^T Q^T \tilde{\Lambda}_p Q M\right)}\right). \tag{IV.13}$$



Next, we recall the Poincaré separation theorem (see [11], pp. 190-191), which is crucial in completing the proof.

Theorem 4.1: Let $A \in \mathbb{R}^{n\times n}$ be symmetric, let $m$ be a given integer with $1 \le m \le n$, and let $B_m = U^T A U$, where $U \in \mathbb{R}^{n\times m}$ is orthonormal. If the eigenvalues of $A$ and $B_m$ are arranged in non-decreasing order, we have

$$\lambda_i(A) \le \lambda_i(B_m) \le \lambda_{i+n-m}(A), \quad i = 1, 2, \ldots, m. \tag{IV.14}$$
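The following is a quick numerical illustration of (IV.14). It uses a random symmetric $A$ and a random $U$ with orthonormal columns, both hypothetical test values. `np.linalg.eigvalsh` returns eigenvalues in ascending order, matching the non-decreasing ordering in the theorem. In 0-based indexing, the index shift $i + n - m$ becomes a slice.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 7, 3
G = rng.normal(size=(n, n))
A = (G + G.T) / 2                              # symmetric test matrix
U, _ = np.linalg.qr(rng.normal(size=(n, m)))   # n x m with orthonormal columns
B_m = U.T @ A @ U

lam_A = np.linalg.eigvalsh(A)                  # ascending (non-decreasing) order
lam_B = np.linalg.eigvalsh(B_m)
lower_ok = bool(np.all(lam_A[:m] <= lam_B + 1e-12))      # lambda_i(A) <= lambda_i(B_m)
upper_ok = bool(np.all(lam_B <= lam_A[n - m:] + 1e-12))  # lambda_i(B_m) <= lambda_{i+n-m}(A)
print(lower_ok, upper_ok)
```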
Using (IV.14) in (IV.13), we get

$$G(M) \le 2^{-m}\prod_{i=1}^{m}\left(1 + \frac{1}{\lambda_i(\tilde{\Lambda}_p)}\right). \tag{IV.15}$$

Choosing $QM = \left[I_m \;\; \mathbf{0}_{m\times(n-m)}\right]^T$ clearly satisfies $\lambda_i(M^T Q^T \tilde{\Lambda}_p Q M) = \lambda_i(\tilde{\Lambda}_p)$ for $1 \le i \le m$, thereby achieving (IV.15) with equality. Furthermore, since eigenvalues are invariant under similarity transformations, for any unitary $\Gamma \in \mathbb{R}^{m\times m}$, choosing $QM = \left[\Gamma \;\; \mathbf{0}_{m\times(n-m)}\right]^T$ also achieves (IV.15) with equality. Also, the resulting $M = Q^T\left[\Gamma \;\; \mathbf{0}_{m\times(n-m)}\right]^T$ clearly satisfies $M \in \mathcal{S}_M$. Hence, any such $M$ is a solution to (4.5), where the maximum value is the RHS of (IV.15). □

² See [11] for the existence of such a $Q$. Note that such a $Q$ is unique iff the eigenvalues of $\Lambda_p$ are distinct.
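The optimizer described above can be sketched in a few lines. Assume a hypothetical diagonal $\Lambda_p$ with positive, unsorted entries. The permutation $Q$ then amounts to locating the $m$ smallest diagonal entries. Placing an $m \times m$ orthogonal block on those rows, with zeros elsewhere, gives an orthonormal $M$ that attains the right-hand side of (IV.15). Random orthonormal $M$ stay at or below it.

```python
import numpy as np

def G_value(M, lam_p):
    """G(M) = 2^{-m} * prod_i (1 + 1 / lambda_i(M^T Lambda_p M))."""
    eig = np.linalg.eigvalsh(M.T @ np.diag(lam_p) @ M)
    return float(np.prod(1.0 + 1.0 / eig) / 2.0 ** M.shape[1])

def optimal_M(lam_p, m, block=None):
    """Orthonormal n x m M whose rows at the m smallest entries of lam_p hold
    an orthogonal block (identity by default) and are zero elsewhere."""
    n = lam_p.size
    idx = np.argsort(lam_p)[:m]                # positions of the m smallest eigenvalues
    M = np.zeros((n, m))
    M[idx, :] = np.eye(m) if block is None else block
    return M

rng = np.random.default_rng(6)
n, m = 8, 3
lam_p = rng.uniform(0.1, 5.0, size=n)          # hypothetical positive diagonal of Lambda_p
bound = float(np.prod(1.0 + 1.0 / np.sort(lam_p)[:m]) / 2.0 ** m)   # RHS of (IV.15)

M_star = optimal_M(lam_p, m)
Gamma, _ = np.linalg.qr(rng.normal(size=(m, m)))   # arbitrary orthogonal block
M_rot = optimal_M(lam_p, m, Gamma)
random_vals = [G_value(np.linalg.qr(rng.normal(size=(n, m)))[0], lam_p) for _ in range(200)]
print(bound, G_value(M_star, lam_p), G_value(M_rot, lam_p), max(random_vals))
```

Sorting with `np.argsort` plays the role of $Q$ here. The rotated block `Gamma` illustrates the non-uniqueness of the maximizer noted in the text.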
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